Abstract. This note demonstrates that it is possible to bound the expectation of an arbitrary
norm of a random matrix drawn from the Stiefel manifold in terms of the expected norm of a 
standard Gaussian matrix with the same dimensions. A related comparison holds for any convex 
function of a random matrix drawn from the Stiefel manifold. For certain norms, a reversed 
inequality is also valid. 



1. Main Result 

Many problems in high-dimensional geometry concern the properties of a random $k$-dimensional
subspace of the Euclidean space $\mathbb{R}^n$. For instance, the Johnson-Lindenstrauss Lemma [JL84] shows
that, typically, the metric geometry of a collection of $N$ points is preserved when we project the
points onto a random subspace with dimension $O(\log N)$. Another famous example is Dvoretzky's
Theorem [Dvo61, Mil71, Bal97], which states that, typically, the intersection between the unit ball
of a Banach space with dimension $N$ and a random subspace with dimension $O(\log N)$ is comparable
with a Euclidean ball.

In geometric problems, it is often convenient to work with matrices rather than subspaces. 
Therefore, we introduce the Stiefel manifold, 

\[ \mathbb{V}_k^n := \{ Q \in \mathbb{M}^{n \times k} : Q^* Q = I \}, \]

which is the collection of real $n \times k$ matrices with orthonormal columns. The elements of the Stiefel
manifold $\mathbb{V}_k^n$ are sometimes called $k$-frames in $\mathbb{R}^n$. The range of a $k$-frame in $\mathbb{R}^n$ determines a
$k$-dimensional subspace of $\mathbb{R}^n$, but the mapping from $k$-frames to subspaces is not injective.

It is easy to check that each Stiefel manifold is invariant under orthogonal transformations on the
left and the right. An important consequence is that the Stiefel manifold admits an invariant
Haar probability measure, which can be regarded as a uniform distribution on $k$-frames in $\mathbb{R}^n$. A
matrix $Q$ drawn from the Haar measure on $\mathbb{V}_k^n$ is called a random $k$-frame in $\mathbb{R}^n$.

It can be challenging to compute functions of a random $k$-frame $Q$. The main reason is that
the entries of the matrix $Q$ are correlated on account of the orthonormality constraint $Q^* Q = I$.
Nevertheless, if we zoom in on a small part of the matrix, the local correlations are very weak
because orthogonality is a global property. In other words, the entries of a small submatrix of $Q$
are effectively independent for many practical purposes [Jia06].

As a consequence of this observation, we might hope to replace certain calculations on a random
$k$-frame by calculations on a random matrix with independent entries. An obvious candidate is a
matrix $G \in \mathbb{M}^{n \times k}$ whose entries are independent $N(0, n^{-1})$ random variables. We call the
associated probability distribution on $\mathbb{M}^{n \times k}$ the normalized Gaussian distribution.

Why is this distribution a good proxy for a random $k$-frame in $\mathbb{R}^n$? First, a normalized Gaussian
matrix $G$ satisfies $\mathbb{E}(G^* G) = I$, so the columns of $G$ are orthonormal on average. Second, the
normalized Gaussian distribution is invariant under orthogonal transformations from the left and the
right, so it shares many algebraic and geometric properties with a random $k$-frame. Furthermore,
we have a wide variety of methods for working with Gaussian matrices, in contrast with the more
limited set of techniques available for dealing with random $k$-frames.
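
The first property is easy to check numerically. The following Python sketch is an illustration added here, not part of the original argument; the helper names `normalized_gaussian`, `gram`, and `mean_gram` and the sizes $n = 6$, $k = 3$ are arbitrary choices. It averages $G^* G$ over many draws of a normalized Gaussian matrix and prints a matrix that should be close to the identity.

```python
import random

def normalized_gaussian(n, k, rng):
    """Return an n x k matrix (list of rows) with independent N(0, 1/n) entries."""
    sd = n ** -0.5
    return [[rng.gauss(0.0, sd) for _ in range(k)] for _ in range(n)]

def gram(G):
    """Return the k x k Gram matrix G^T G of a matrix stored as a list of rows."""
    k = len(G[0])
    return [[sum(row[a] * row[b] for row in G) for b in range(k)] for a in range(k)]

def mean_gram(n, k, trials, seed=0):
    """Monte Carlo estimate of E(G^T G) under the normalized Gaussian distribution."""
    rng = random.Random(seed)
    acc = [[0.0] * k for _ in range(k)]
    for _ in range(trials):
        M = gram(normalized_gaussian(n, k, rng))
        for a in range(k):
            for b in range(k):
                acc[a][b] += M[a][b] / trials
    return acc

avg = mean_gram(n=6, k=3, trials=4000)
for row in avg:
    print(" ".join(f"{x:6.3f}" for x in row))
```

Any single draw of $G^* G$ fluctuates around the identity; only the average is close to it.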

These intuitions are well established in the random matrix literature, and many authors have
developed detailed quantitative refinements. In particular, we mention Jiang's paper [Jia06] and
its references, which discuss the proportion of entries in a random orthogonal matrix that can
be simultaneously approximated using independent standard normal variables. Subsequent work
by Chatterjee and E. Meckes [CM08] demonstrates that the joint distribution of $k$ (linearly
independent) linear functionals of a random orthogonal matrix is close in Wasserstein distance to an
appropriate Gaussian distribution, provided that $k = o(n)$.

We argue that there is a general comparison principle for random $k$-frames and normalized
Gaussian matrices of the same size. Recall that a convex function is called sublinear when it is
positive homogeneous. Norms, in particular, are sublinear. Theorem 1 ensures that the expectation
of a nonnegative sublinear function of a random $k$-frame is dominated by that of a normalized
Gaussian matrix. This result also allows us to study moments and, therefore, tail behavior.

Theorem 1 (Sublinear Comparison Principle). Assume that $k = \rho n$ for $\rho \in (0, 1]$. Let $Q$ be
uniformly distributed on the Stiefel manifold $\mathbb{V}_k^n$, and let $G \in \mathbb{M}^{n \times k}$ be a matrix with independent
$N(0, n^{-1})$ entries. For each nonnegative, sublinear, convex function $|\cdot|$ on $\mathbb{M}^{n \times k}$ and each weakly
increasing, convex function $\Phi : \mathbb{R} \to \mathbb{R}$,

\[ \mathbb{E}\,\Phi(|Q|) \le \mathbb{E}\,\Phi((1 + \rho/2)\,|G|). \]

In particular, for all $k \le n$,

\[ \mathbb{E}\,\Phi(|Q|) \le \mathbb{E}\,\Phi(1.5\,|G|). \]
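
As a sanity check on the statement, the sketch below estimates both sides of the first inequality by simulation. It is an illustration under stated assumptions rather than part of the proof: $|\cdot|$ is taken to be the entrywise $\ell_1$ norm, $\Phi$ is the identity, and the random $k$-frame is produced by Gram-Schmidt orthogonalization of a standard Gaussian matrix, which relies on the Bartlett decomposition (Proposition 4). The function names and the sizes $n = 8$, $k = 4$ are hypothetical choices.

```python
import math
import random

def random_frame(n, k, rng):
    """Gram-Schmidt on k Gaussian vectors in R^n; returns the k orthonormal columns of Q."""
    cols = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for u in cols:
            c = sum(a * b for a, b in zip(u, v))
            v = [a - c * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        cols.append([a / norm for a in v])
    return cols

def normalized_gaussian(n, k, rng):
    """Return k columns of length n with independent N(0, 1/n) entries."""
    sd = n ** -0.5
    return [[rng.gauss(0.0, sd) for _ in range(n)] for _ in range(k)]

def l1_norm(cols):
    """Entrywise l1 norm: one example of a nonnegative, sublinear, convex function."""
    return sum(abs(x) for col in cols for x in col)

def compare(n, k, trials, seed=0):
    """Estimate E|Q| and (1 + rho/2) E|G| with Phi equal to the identity."""
    rng = random.Random(seed)
    rho = k / n
    lhs = sum(l1_norm(random_frame(n, k, rng)) for _ in range(trials)) / trials
    rhs = sum(l1_norm(normalized_gaussian(n, k, rng)) for _ in range(trials)) / trials
    return lhs, (1 + rho / 2) * rhs

lhs, rhs = compare(n=8, k=4, trials=2000)
print(f"E|Q| ~ {lhs:.3f}   (1 + rho/2) E|G| ~ {rhs:.3f}")
```

For this norm the estimate of $\mathbb{E}|Q|$ slightly exceeds the estimate of $\mathbb{E}|G|$ before the factor $1 + \rho/2$ is applied, which suggests that a leading constant above one cannot be dropped in this example.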

Note that the leading constant in the bound is asymptotic to one when $k = o(n)$. Conversely,
Section 2 identifies situations where the leading constant must be at least one. We establish
Theorem 1 in Section 3 as a consequence of a more comprehensive result, Theorem 5, for convex
functions of a random $k$-frame.

A simple example suffices to show that Theorem 1 does not admit a matching lower bound, no
matter what comparison factor $\beta$ we allow. Indeed, suppose that we fix a positive number $\beta$. Write
$\|\cdot\|$ for the spectral norm (i.e., the operator norm between two Hilbert spaces), and consider the
weakly increasing, convex function

\[ \Phi(t) := (t - 1)_+ \quad\text{where}\quad (a)_+ := \max\{0, a\}. \]

For a normalized Gaussian matrix $G \in \mathbb{M}^{n \times k}$, we compute that

\[ \mathbb{E}\,\Phi(\beta \|G\|) = \mathbb{E}\,(\beta \|G\| - 1)_+ > 0 \]

because there is always a positive probability that $\beta \|G\| \ge 2$. Meanwhile, the spectral norm of a
random $k$-frame $Q$ in $\mathbb{R}^n$ satisfies $\|Q\| = 1$, so

\[ \mathbb{E}\,\Phi(\|Q\|) = \Phi(1) = 0. \]

Inexorably,

\[ \mathbb{E}\,\Phi(\beta \|G\|) \le \mathbb{E}\,\Phi(\|Q\|) \quad\Longrightarrow\quad \beta \le 0. \]

Therefore, it is impossible to control $\mathbb{E}\,\Phi(\beta |G|)$ using $\mathbb{E}\,\Phi(|Q|)$ unless we impose additional
restrictions. Turn to Section 4 for some conditions under which we can reverse the comparison in Theorem 1.
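
The counterexample can be illustrated numerically in the special case $k = 1$, where the spectral norm of an $n \times 1$ matrix is simply the Euclidean norm of a vector. The sketch below is an added illustration; the values $\beta = 0.5$ and $n = 4$ and the function names are arbitrary assumptions. The Gaussian side is strictly positive even for this small $\beta$, while the $k$-frame side is exactly zero.

```python
import math
import random

def phi(t):
    """Phi(t) = (t - 1)_+, which is weakly increasing and convex."""
    return max(0.0, t - 1.0)

def mean_phi_gaussian(beta, n, trials, seed=0):
    """Monte Carlo estimate of E Phi(beta * ||g||), g with independent N(0, 1/n) entries."""
    rng = random.Random(seed)
    sd = n ** -0.5
    total = 0.0
    for _ in range(trials):
        norm = math.sqrt(sum(rng.gauss(0.0, sd) ** 2 for _ in range(n)))
        total += phi(beta * norm)
    return total / trials

beta = 0.5
gauss_side = mean_phi_gaussian(beta, n=4, trials=20000)
frame_side = phi(1.0)  # every unit vector q has ||q|| = 1, so no sampling is needed
print(gauss_side, frame_side)
```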

One of the anonymous referees has made a valuable point that deserves amplification. Note that a
random orthogonal matrix with dimension one is a scalar Rademacher variable, while a normalized
Gaussian matrix with dimension one is a scalar Gaussian variable. From this perspective, Theorem 1
resembles a noncommutative version of the classical comparison between Rademacher series and
Gaussian series in a Banach space [LT91, Sec. 4.2]. Let us state an extension of Theorem 1 that
makes this connection explicit.



A COMPARISON PRINCIPLE FOR RANDOM SUBSPACES

Theorem 2 (Noncommutative Gaussian Comparison Principle). Fix a sequence of square matrices
$\{A_j : j = 1, \dots, J\} \subset \mathbb{M}^{n \times n}$. Consider an independent family $\{Q_j : j = 1, \dots, J\} \subset \mathbb{M}^{n \times n}$ of
random orthogonal matrices, and an independent family $\{G_j : j = 1, \dots, J\} \subset \mathbb{M}^{n \times n}$ of normalized
Gaussian matrices. For each nonnegative, sublinear, convex function $|\cdot|$ on $\mathbb{M}^{n \times n}$ and each weakly
increasing, convex function $\Phi : \mathbb{R} \to \mathbb{R}$,

\[ \mathbb{E}\,\Phi\Big( \Big| \sum_{j=1}^{J} Q_j A_j \Big| \Big) \le \mathbb{E}\,\Phi\Big( 1.5\, \Big| \sum_{j=1}^{J} G_j A_j \Big| \Big). \]

We can complete the proof of Theorem 2 using an obvious variation on the arguments behind
Theorem 1. We omit further details out of consideration for the reader's patience.

2. A Few Examples 

Before proceeding with the proof of Theorem 1, we present some applications that may be
interesting. We need the following result [LT91, Thm. 3.20], which is due to Gordon [Gor85].

Proposition 3 (Spectral Norm of a Gaussian Matrix). Let $G \in \mathbb{M}^{n \times k}$ be a random matrix with
independent $N(0, n^{-1})$ entries. Then

\[ \mathbb{E}\,\|G\|_{\ell_2^k \to \ell_2^n} \le 1 + \sqrt{k/n}. \]

2.1. How good are the constants? Consider a uniformly random orthogonal matrix $Q \in \mathbb{V}_n^n$.
Evidently, its spectral norm $\|Q\| = 1$. Let $G \in \mathbb{M}^{n \times n}$ be a normalized Gaussian matrix. Theorem 1
and Proposition 3 ensure that

\[ 1 = \mathbb{E}\,\|Q\| \le 1.5\, \mathbb{E}\,\|G\| \le 3. \]

Thus, the constant 1.5 in Theorem 1 cannot generally be improved by a factor greater than three.

Next, we specialize to the trivial case where $k = n = 1$. Let $Q$ be a Rademacher random variable,
and let $G$ be a standard Gaussian random variable. Theorem 1 implies that

\[ 1 = \mathbb{E}\,|Q| \le 1.5\, \mathbb{E}\,|G| = 1.5 \sqrt{2/\pi} < 1.2. \]

Therefore, we cannot improve the constant by a factor of more than 1.2 if we demand a result that
holds when $n$ is small.

Finally, consider the case where $k = 1$. Let $q$ be a random unit vector in $\mathbb{R}^n$, and let $g$ be a
vector in $\mathbb{R}^n$ with independent $N(0, n^{-1})$ entries. Applying Theorem 1 with the Euclidean norm,
we obtain

\[ 1 = \mathbb{E}\,\|q\|_2 \le \Big( 1 + \frac{1}{2n} \Big) \cdot \mathbb{E}\,\|g\|_2 \le 1 + \frac{1}{2n}. \]

This example demonstrates that the best constant in Theorem 1 is at least one when $k = 1$ and $n$
is large. Related examples show that the best constant is at least one as long as $k = o(n)$.
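
The two scalar and vector examples can be evaluated exactly, because $\sqrt{n}\,\|g\|_2$ is a chi variable whose mean is given by the gamma-function ratio in Proposition 6. The sketch below is an added illustration (the function name is hypothetical). It tabulates $\mathbb{E}\,\|g\|_2$, the product $(1 + 1/(2n)) \cdot \mathbb{E}\,\|g\|_2$, and the factor $1 + 1/(2n)$; the row $n = 1$ reproduces the value $1.5\sqrt{2/\pi}$ from the Rademacher example.

```python
import math

def mean_norm_normalized_gaussian(n):
    """E||g||_2 for g in R^n with independent N(0, 1/n) entries (a rescaled chi mean)."""
    log_ratio = math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
    return math.sqrt(2.0 / n) * math.exp(log_ratio)

for n in (1, 2, 10, 100, 1000):
    m = mean_norm_normalized_gaussian(n)
    factor = 1 + 1 / (2 * n)
    # columns: n, E||g||, (1 + 1/(2n)) E||g||, 1 + 1/(2n)
    print(n, round(m, 4), round(factor * m, 4), round(factor, 4))
```

The middle column stays between one and $1 + 1/(2n)$, so both inequalities in the display become tight as $n$ grows.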



2.2. Maximum entry of a random orthogonal matrix. Consider a uniformly random orthogonal
matrix $Q \in \mathbb{V}_n^n$, and let $G \in \mathbb{M}^{n \times n}$ be a normalized Gaussian matrix. Using Theorem 1 and
a standard bound for the maximum of standard Gaussian variables, we estimate that

\[ \mathbb{E} \max_{i,j} |Q_{ij}| \le 1.5\, \mathbb{E} \max_{i,j} |G_{ij}| \le 1.5 \sqrt{\frac{2 \log(n^2) + 1}{n}} = 3 \sqrt{\frac{\log(n) + 1/4}{n}}. \]

Jiang [Jia05] has shown that, almost surely, a sequence $\{Q^{(n)}\}$ of random orthogonal matrices with
$Q^{(n)} \in \mathbb{V}_n^n$ has the limiting behavior

\[ \liminf_{n \to \infty} \sqrt{\frac{n}{\log n}} \cdot \max_{i,j} |Q^{(n)}_{ij}| = 2 \quad\text{and}\quad \limsup_{n \to \infty} \sqrt{\frac{n}{\log n}} \cdot \max_{i,j} |Q^{(n)}_{ij}| = \sqrt{6}. \]

We see that our simple estimate is not sharp, but it is very reasonable. 
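
A small simulation gives a feel for how much room the estimate leaves. The sketch below is an added illustration; the function names, the size $n = 40$, and the number of trials are arbitrary assumptions. It samples Haar orthogonal matrices by Gram-Schmidt orthogonalization of a standard Gaussian matrix (Proposition 4) and compares the average largest entry with the bound $3\sqrt{(\log n + 1/4)/n}$.

```python
import math
import random

def random_orthogonal(n, rng):
    """Gram-Schmidt on n Gaussian vectors in R^n; returns the columns of an orthogonal matrix."""
    cols = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for u in cols:
            c = sum(a * b for a, b in zip(u, v))
            v = [a - c * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        cols.append([a / norm for a in v])
    return cols

def mean_max_entry(n, trials, seed=0):
    """Monte Carlo estimate of E max_{i,j} |Q_ij| for a Haar orthogonal matrix Q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        Q = random_orthogonal(n, rng)
        total += max(abs(x) for col in Q for x in col)
    return total / trials

n = 40
estimate = mean_max_entry(n, trials=20)
bound = 3 * math.sqrt((math.log(n) + 0.25) / n)
print(f"estimated E max|Q_ij| = {estimate:.3f}, bound = {bound:.3f}")
```

The bound only becomes informative once it drops below the trivial bound of one, which is why the sketch uses a moderately large $n$.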
2.3. Spectral norm of a submatrix of a random $k$-frame. Consider a uniformly random
$k$-frame $Q \in \mathbb{V}_k^n$, and let $G \in \mathbb{M}^{n \times k}$ be a normalized Gaussian matrix. Define the linear map $\mathcal{L}_j$
that restricts an $n \times k$ matrix to its first $j$ rows and rescales it by $\sqrt{n/j}$. As a consequence, the
columns of the $j \times k$ matrix $\mathcal{L}_j(Q)$ approximately have unit Euclidean norm. We may compute
that

\[ \mathbb{E}\,\|\mathcal{L}_j(Q)\| \le (1 + k/(2n))\, \mathbb{E}\,\|\mathcal{L}_j(G)\| \le (1 + k/(2n)) \big( 1 + \sqrt{k/j} \big) \]

because of Theorem 1 and Proposition 3.

This estimate is interesting because it applies for all values of $j$ and $k$. Note that the leading
constant $1 + k/(2n)$ is asymptotic to one whenever $k = o(n)$. In contrast, we recall Jiang's result [Jia06]
that the total-variation distance between the distributions of $\mathcal{L}_j(Q)$ and $\mathcal{L}_j(G)$ vanishes if and
only if $j, k = o(\sqrt{n})$. A related fact is that, under a natural coupling of $Q$ and $G$, the matrix
$\ell_\infty$-norm distance between $\mathcal{L}_j(Q)$ and $\mathcal{L}_j(G)$ vanishes in probability if and only if $k = o(n / \log n)$.

3. Proof of the Sublinear Comparison Principle 

The main tool in our proof is a well-known theorem of Bartlett that describes the statistical 
properties of the QR decomposition of a standard Gaussian matrix, i.e., a matrix with independent 
$N(0, 1)$ entries. See Muirhead's book [Mui82] for a detailed derivation of this result.

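Proposition 4 (The Bartlett Decomposition). Assume that $k \le n$, and let $\Gamma \in \mathbb{M}^{n \times k}$ be a standard
Gaussian matrix. Then

\[ \Gamma_{n \times k} \sim Q_{n \times k} R_{k \times k}. \]

The factors $Q$ and $R$ are statistically independent. The matrix $Q$ is uniformly distributed on the
Stiefel manifold $\mathbb{V}_k^n$. The matrix $R$ is a random upper-triangular matrix of the form

\[ R = \begin{bmatrix}
X_1 & Y_{12} & Y_{13} & \dots & Y_{1k} \\
    & X_2 & Y_{23} & \dots & Y_{2k} \\
    &     & \ddots & \ddots & \vdots \\
    &     &        & X_{k-1} & Y_{k-1,k} \\
    &     &        &         & X_k
\end{bmatrix} \]

where the diagonal entries $X_i^2 \sim \chi^2_{n-i+1}$ and the super-diagonal entries $Y_{ij} \sim N(0, 1)$; furthermore,
all these random variables are mutually independent.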
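
The distribution of the diagonal of $R$ can be spot-checked by simulation. The sketch below is an added illustration with hypothetical function names and sizes. It assumes that the QR factorization is computed by Gram-Schmidt orthogonalization, so that the diagonal of $R$ is positive and $X_i$ is the norm of the $i$th column after projecting out the earlier ones. Since a chi-square variable with $p$ degrees of freedom has mean $p$, the averages of $X_i^2$ should be close to $n - i + 1$.

```python
import math
import random

def gram_schmidt_diagonal(n, k, rng):
    """Gram-Schmidt on a standard Gaussian n x k matrix; returns diag(R) = (X_1, ..., X_k)."""
    cols, diag = [], []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for u in cols:
            c = sum(a * b for a, b in zip(u, v))
            v = [a - c * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        diag.append(norm)
        cols.append([a / norm for a in v])
    return diag

def mean_squared_diagonal(n, k, trials, seed=0):
    """Monte Carlo estimate of E(X_i^2) for i = 1, ..., k."""
    rng = random.Random(seed)
    acc = [0.0] * k
    for _ in range(trials):
        for i, x in enumerate(gram_schmidt_diagonal(n, k, rng)):
            acc[i] += x * x / trials
    return acc

n, k = 6, 3
means = mean_squared_diagonal(n, k, trials=4000)
expected = [n - i + 1 for i in range(1, k + 1)]  # chi-square means 6, 5, 4
print([round(m, 2) for m in means], expected)
```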

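We may now establish a comparison principle for a general convex function of a random $k$-frame.

Theorem 5 (Convex Comparison Principle). Assume that $k \le n$. Let $Q \in \mathbb{M}^{n \times k}$ be uniformly
distributed on the Stiefel manifold $\mathbb{V}_k^n$, and let $\Gamma \in \mathbb{M}^{n \times k}$ be a standard Gaussian matrix. For each
convex function $f : \mathbb{M}^{n \times k} \to \mathbb{R}$, it holds that

\[ \mathbb{E} f(Q) \le \mathbb{E} f(\alpha^{-1} \Gamma) \quad\text{where}\quad \alpha := \alpha(k, n) := \frac{1}{k} \sum_{i=1}^{k} \mathbb{E}(X_i) \]

and $X_i^2 \sim \chi^2_{n-i+1}$. Similarly, for each concave function $g : \mathbb{M}^{n \times k} \to \mathbb{R}$, it holds that

\[ \mathbb{E}\, g(Q) \ge \mathbb{E}\, g(\alpha^{-1} \Gamma). \]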

Proof. The result is a direct consequence of the Bartlett decomposition and Jensen's inequality.
Define $\Gamma$, $Q$, and $R$ as in the statement of Proposition 4. Let $P \in \mathbb{M}^{k \times k}$ be a uniformly random
permutation matrix, independent from everything else.

First, observe that

\[ \mathbb{E}(P R P^T) = (\mathbb{E}\, \overline{\operatorname{tr}}(R)) \cdot I = \alpha I \quad\text{where}\quad \alpha := \frac{1}{k} \sum_{i=1}^{k} \mathbb{E}(X_i). \]

The symbol $\overline{\operatorname{tr}}$ denotes the normalized trace, and the random variable $X_i^2 \sim \chi^2_{n-i+1}$ for each index
$i = 1, \dots, k$. Since the function $f$ is convex, Jensen's inequality yields

\[ \mathbb{E} f(Q) = \mathbb{E} f(\alpha^{-1} Q (\mathbb{E}\, P R P^T)) \le \mathbb{E} f(\alpha^{-1} Q P R P^T). \]

It remains to simplify the random matrix in the last expression.

Recall that the Haar distribution on the Stiefel manifold and the normalized Gaussian
distribution on $\mathbb{M}^{n \times k}$ are both invariant under orthogonal transformations. Therefore, $Q \sim QS$ and
$\Gamma \sim \Gamma S^T$ for each fixed permutation matrix $S$. It follows that

\[ \mathbb{E}[f(\alpha^{-1} Q P R P^T) \mid P] = \mathbb{E}[f(\alpha^{-1} Q R P^T) \mid P] = \mathbb{E}[f(\alpha^{-1} \Gamma P^T) \mid P] = \mathbb{E} f(\alpha^{-1} \Gamma), \]

where we have also used the fact that $Q$ and $R$ are statistically independent. Combining the last
two displayed formulas with the tower property of conditional expectation, we reach

\[ \mathbb{E} f(Q) \le \mathbb{E}\, \mathbb{E}[f(\alpha^{-1} Q P R P^T) \mid P] = \mathbb{E} f(\alpha^{-1} \Gamma). \]

The proof for concave functions is analogous. □
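
The averaging identity $\mathbb{E}(P R P^T) = (\mathbb{E}\,\overline{\operatorname{tr}}(R)) \cdot I$ has two ingredients that can be separated in a small exact computation. The sketch below is an added illustration; the $3 \times 3$ matrix and the function name are arbitrary. For a fixed upper-triangular $R$, averaging over all permutations puts the normalized trace on the diagonal and a common value, proportional to the sum of the super-diagonal entries, off the diagonal. The off-diagonal part cancels once we also average over the sign-flipped super-diagonal, which mimics the fact that the $Y_{ij}$ are symmetric with mean zero.

```python
from itertools import permutations

def permutation_average(R):
    """Exact average of P R P^T over all k! permutation matrices P (R is a list of rows)."""
    k = len(R)
    perms = list(permutations(range(k)))
    return [[sum(R[s[a]][s[b]] for s in perms) / len(perms) for b in range(k)]
            for a in range(k)]

R = [[2.0, 1.0, -1.0],
     [0.0, 3.0, 4.0],
     [0.0, 0.0, 5.0]]
avg = permutation_average(R)
normalized_trace = sum(R[i][i] for i in range(3)) / 3

# Flip the signs of the super-diagonal entries and average the two results.
R_flip = [[R[i][j] if i == j else -R[i][j] for j in range(3)] for i in range(3)]
avg_flip = permutation_average(R_flip)
sym = [[(x + y) / 2 for x, y in zip(r1, r2)] for r1, r2 in zip(avg, avg_flip)]

for row in avg:
    print([round(x, 4) for x in row])
for row in sym:
    print([round(x, 4) for x in row])
```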

For Theorem 5 to be useful, we need to make some estimates for the constant $\alpha(k, n)$ that arises
in the argument. To that end, we state without proof a simple result on the moments of a chi-square
random variable.

Proposition 6 (Chi-Square Moments). Let $X^2$ be a chi-square random variable with $p$ degrees of
freedom, and let $X$ denote its positive square root. Then

\[ \mathbb{E}(X) = \frac{\sqrt{2} \cdot \Gamma((p + 1)/2)}{\Gamma(p/2)}. \]

Given the identity from Proposition 6, standard inequalities for this ratio of gamma functions
allow us to estimate the constant $\alpha$ in terms of elementary operations and radicals.

Lemma 7 (Estimates for the Constant). The constant $\alpha(k, n)$ defined in Theorem 5 satisfies

\[ \frac{1}{k} \sum_{i=0}^{k-1} \sqrt{n - (i + 1/2)} \le \alpha(k, n) \le \frac{1}{k} \sum_{i=0}^{k-1} \sqrt{n - i}. \]

Proof. We require bounds for

\[ \alpha = \frac{1}{k} \sum_{i=1}^{k} \mathbb{E}(X_i) \quad\text{where}\quad X_i^2 \sim \chi^2_{n-i+1}. \]

Proposition 6 states that

\[ \mathbb{E}(X_i) = \frac{\sqrt{2} \cdot \Gamma((p_i + 1)/2)}{\Gamma(p_i/2)} \quad\text{for } p_i = n - i + 1. \]

This ratio of gamma functions appears frequently, and the following bounds are available.

\[ \sqrt{p - 1/2} < \frac{\sqrt{2} \cdot \Gamma((p + 1)/2)}{\Gamma(p/2)} < \sqrt{p} \quad\text{for } p \ge 1/2. \]

Combine these relations and reindex the sums to reach the result.

The upper bound can be obtained directly from Jensen's inequality and the basic properties of
a chi-square variable: $\mathbb{E}(X_i) \le [\mathbb{E}(X_i^2)]^{1/2} = \sqrt{n - i + 1}$. In contrast, the lower bound seems to
require hard analysis. □
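
Because Proposition 6 gives $\mathbb{E}(X_i)$ in closed form, the constant $\alpha(k, n)$ and both bounds in Lemma 7 can be evaluated directly. The sketch below is an added illustration with hypothetical function names. It uses log-gamma values to avoid overflow and prints the lower bound, the exact constant, and the upper bound for a few pairs $(k, n)$.

```python
import math

def chi_mean(p):
    """E(X) when X^2 is chi-square with p degrees of freedom (Proposition 6)."""
    return math.sqrt(2.0) * math.exp(math.lgamma((p + 1) / 2) - math.lgamma(p / 2))

def alpha(k, n):
    """alpha(k, n) = (1/k) * sum_{i=1}^k E(X_i), where X_i^2 has n - i + 1 degrees of freedom."""
    return sum(chi_mean(n - i + 1) for i in range(1, k + 1)) / k

def lemma7_bounds(k, n):
    """Lower and upper bounds for alpha(k, n) from Lemma 7, after reindexing to i = 0..k-1."""
    lower = sum(math.sqrt(n - i - 0.5) for i in range(k)) / k
    upper = sum(math.sqrt(n - i) for i in range(k)) / k
    return lower, upper

for k, n in [(1, 1), (5, 10), (50, 100), (100, 100)]:
    lo, hi = lemma7_bounds(k, n)
    print(k, n, round(lo, 4), round(alpha(k, n), 4), round(hi, 4))
```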

For practical purposes, it is valuable to simplify the estimates from Lemma 7 even more. To
accomplish this task, we interpret the sums in terms of basic integral approximations.

Lemma 8 (Simplified Estimates). The constant $\alpha(k, n)$ defined in Theorem 5 satisfies

\[ \frac{2}{3k} \left[ n^{3/2} - (n - k)^{3/2} \right] \le \alpha(k, n) \le \frac{2}{3k} \left[ n^{3/2} - (n - k)^{3/2} \right] + \frac{1}{2k} \left[ \sqrt{n} - \sqrt{n - k} \right]. \]

The minimum value for the lower bound occurs when $k = n$, and

\[ \frac{2}{3} \sqrt{n} \le \alpha(n, n) \le \frac{2}{3} \sqrt{n} + o(1) \quad\text{as } n \to \infty. \]

Furthermore, when we express $k = \rho n$ for $\rho \in (0, 1]$, it holds that

\[ \frac{1}{\alpha(\rho n, n)} \le \frac{1}{\sqrt{n}} \cdot (1 + \rho/2). \]

Proof. Fix the parameters $k$ and $n$. Define the real-valued function $h(x) = \sqrt{n - x}$, and observe
that $h$ is concave and decreasing on its natural domain. The lower bound for $\alpha$ from Lemma 7
implies that

\[ \alpha \ge \frac{1}{k} \sum_{i=0}^{k-1} h(i + 1/2) \ge \frac{1}{k} \int_0^k h(x)\, dx. \]

To justify the second inequality, we observe that the sum corresponds with the midpoint-rule
approximation to the integral. Because the integrand is concave, the midpoint rule must overestimate
the integral. Evaluate the integral to obtain the stated lower bound.

To see that the minimum value for the lower bound occurs when $k = n$, notice that

\[ k \mapsto \frac{1}{k} \int_0^k h(x)\, dx \]

is the running average of a decreasing function. Of course, the running average also decreases.
Next, we use the relation $k = \rho n$ to simplify (the reciprocal of) the lower bound, which yields

\[ \frac{1}{\alpha(\rho n, n)} \le 1.5 \cdot n^{-1/2} \cdot \frac{\rho}{1 - (1 - \rho)^{3/2}} \le n^{-1/2} \cdot (1 + \rho/2). \]

The second inequality holds because the fraction is a convex function of $\rho$ on the interval $(0, 1]$, so
we may bound it above by the chord $\rho \mapsto (2 + \rho)/3$ connecting the endpoints.

The proof of the upper bound follows from a related principle: The trapezoidal rule underestimates
the integral of a concave function. Lemma 7 ensures that

\[ \alpha \le \frac{1}{k} \sum_{i=0}^{k-1} h(i) \le \frac{1}{k} \int_0^k h(x)\, dx + \frac{1}{2k} \big( h(0) - h(k) \big). \]

Here, we have applied the trapezoidal rule on the interval $[0, k]$ and then redistributed the terms
associated with the endpoints. Evaluate the integral to complete the bound. □
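
The simplified estimates are easy to compare against the exact constant. The sketch below is an added illustration with hypothetical function names. It evaluates $\alpha(k, n)$ from the gamma-function formula, the two bounds of Lemma 8, and the ratio $\sqrt{n}/\alpha(k, n)$ alongside the factor $1 + \rho/2$ that appears in Theorem 1.

```python
import math

def alpha(k, n):
    """alpha(k, n) = (1/k) sum_{i=1}^k sqrt(2) Gamma((p_i + 1)/2) / Gamma(p_i/2), p_i = n - i + 1."""
    total = 0.0
    for i in range(1, k + 1):
        p = n - i + 1
        total += math.sqrt(2.0) * math.exp(math.lgamma((p + 1) / 2) - math.lgamma(p / 2))
    return total / k

def lemma8_bounds(k, n):
    """Integral lower bound and trapezoid-corrected upper bound from Lemma 8."""
    integral = (2.0 / (3.0 * k)) * (n ** 1.5 - (n - k) ** 1.5)
    correction = (math.sqrt(n) - math.sqrt(n - k)) / (2.0 * k)
    return integral, integral + correction

for k, n in [(1, 1), (10, 100), (50, 100), (100, 100)]:
    rho = k / n
    lo, hi = lemma8_bounds(k, n)
    a = alpha(k, n)
    # columns: k, n, lower bound, alpha, upper bound, sqrt(n)/alpha, 1 + rho/2
    print(k, n, round(lo, 4), round(a, 4), round(hi, 4),
          round(math.sqrt(n) / a, 4), 1 + rho / 2)
```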

We are now prepared to establish the main result. 

Proof of Theorem 1. Let $Q$ be a random matrix distributed uniformly on the Stiefel manifold $\mathbb{V}_k^n$,
and let $G \in \mathbb{M}^{n \times k}$ be a normalized Gaussian matrix. We can write $G = n^{-1/2} \Gamma$ where $\Gamma$ is a
standard normal matrix.

Suppose that $|\cdot|$ is a nonnegative, sublinear, convex function and that $\Phi$ is a weakly increasing,
convex function. Then the function $M \mapsto \Phi(|M|)$ is also convex. Theorem 5 demonstrates that

\[ \mathbb{E}\,\Phi(|Q|) \le \mathbb{E}\,\Phi(|\alpha^{-1} \Gamma|) = \mathbb{E}\,\Phi(\alpha^{-1} \sqrt{n} \cdot |G|). \]

For $k = \rho n$, Lemma 8 ensures that the constant $\alpha$ satisfies

\[ \alpha^{-1} \sqrt{n} \le 1 + \rho/2. \]

Given that the function $\Phi$ is increasing and $|G| \ge 0$, we conclude that

\[ \mathbb{E}\,\Phi(|Q|) \le \mathbb{E}\,\Phi(\alpha^{-1} \sqrt{n} \cdot |G|) \le \mathbb{E}\,\Phi((1 + \rho/2) \cdot |G|). \]

This argument establishes the main part of the theorem. To establish the remaining assertion, we
simply assign $\rho = 1$, the maximum value allowed. □

4. Partial Converses 

There are at least a few situations where it is possible to reverse the inequality of Theorem 1.
To develop these results, we record another basic observation about Gaussian matrices [Mui82].

Proposition 9 (Polar Factorization). Assume that $k \le n$. Let $\Gamma \in \mathbb{M}^{n \times k}$ be a standard Gaussian
matrix. Then

\[ \Gamma_{n \times k} \sim Q_{n \times k} W_{k \times k}. \]

The factors $Q$ and $W$ are statistically independent. The matrix $Q$ is uniformly distributed on the
Stiefel manifold $\mathbb{V}_k^n$, and the matrix $W$ is the positive square root of a $k \times k$ Wishart matrix with
$n$ degrees of freedom.

The first converse concerns a right operator ideal norm; that is, a norm $|\!|\!|\cdot|\!|\!|$ that satisfies the
relation $|\!|\!|AS|\!|\!| \le |\!|\!|A|\!|\!| \cdot \|S\|$, where $\|\cdot\|$ is the spectral norm.

Theorem 10 (Partial Converse I). Assume that $k = \rho n$ for $\rho \in (0, 1]$. Let $Q$ be uniformly
distributed on the Stiefel manifold $\mathbb{V}_k^n$, and let $G \in \mathbb{M}^{n \times k}$ be a normalized Gaussian matrix. For
each right operator ideal norm $|\!|\!|\cdot|\!|\!|$, it holds that

\[ \mathbb{E}\,|\!|\!|G|\!|\!| \le (1 + \sqrt{\rho}) \cdot \mathbb{E}\,|\!|\!|Q|\!|\!|. \]

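Proof. The proof uses the polar factorization of the Gaussian matrix described in Proposition 9.
For a standard Gaussian matrix $\Gamma \in \mathbb{M}^{n \times k}$,

\[ \mathbb{E}\,|\!|\!|G|\!|\!| = n^{-1/2}\, \mathbb{E}\,|\!|\!|\Gamma|\!|\!| = n^{-1/2}\, \mathbb{E}\,|\!|\!|QW|\!|\!| \le n^{-1/2}\, \mathbb{E}\big( |\!|\!|Q|\!|\!| \cdot \|W\| \big) = n^{-1/2}\, (\mathbb{E}\,|\!|\!|Q|\!|\!|) \cdot (\mathbb{E}\,\|W\|). \]

The last relation relies on the independence of the polar factors. To continue, we note that the
Wishart square root $W$ has the same distribution as $(\Gamma^* \Gamma)^{1/2}$. Therefore,

\[ n^{-1/2}\, \mathbb{E}\,\|W\| = n^{-1/2}\, \mathbb{E}\,\|(\Gamma^* \Gamma)^{1/2}\| = n^{-1/2}\, \mathbb{E}\,\|\Gamma\| = \mathbb{E}\,\|G\| \le 1 + \sqrt{k/n}, \]

where the last bound follows from Gordon's result, Proposition 3. □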
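
To illustrate Theorem 10 on a concrete case, the sketch below takes $|\!|\!|A|\!|\!|$ to be the largest Euclidean norm of a row of $A$, that is, the operator norm from $\ell_2^k$ to $\ell_\infty^n$. This is a right operator ideal norm because each row of $AS$ has norm at most $\|S\|$ times the norm of the corresponding row of $A$. The choice of norm, the function names, and the sizes $n = 8$, $k = 4$ are assumptions made for this added illustration, and the random $k$-frame is sampled by Gram-Schmidt orthogonalization as in Proposition 4.

```python
import math
import random

def random_frame(n, k, rng):
    """Gram-Schmidt on k Gaussian vectors in R^n; returns the k orthonormal columns of Q."""
    cols = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for u in cols:
            c = sum(a * b for a, b in zip(u, v))
            v = [a - c * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        cols.append([a / norm for a in v])
    return cols

def max_row_norm(cols):
    """Largest Euclidean norm of a row of the matrix whose columns are given."""
    n = len(cols[0])
    return max(math.sqrt(sum(col[i] ** 2 for col in cols)) for i in range(n))

def compare(n, k, trials, seed=0):
    """Estimate E|||G||| and (1 + sqrt(rho)) E|||Q||| for the max-row-norm."""
    rng = random.Random(seed)
    sd = n ** -0.5
    e_g = e_q = 0.0
    for _ in range(trials):
        G = [[rng.gauss(0.0, sd) for _ in range(n)] for _ in range(k)]
        e_g += max_row_norm(G) / trials
        e_q += max_row_norm(random_frame(n, k, rng)) / trials
    return e_g, (1 + math.sqrt(k / n)) * e_q

lhs, rhs = compare(n=8, k=4, trials=1000)
print(f"E|||G||| ~ {lhs:.3f}   (1 + sqrt(rho)) E|||Q||| ~ {rhs:.3f}")
```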

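A version of Theorem 10 also holds for higher moments:

\[ \mathbb{E}\big( |\!|\!|G|\!|\!|^m \big) \le \mathbb{E}\big( \|G\|^m \big) \cdot \mathbb{E}\big( |\!|\!|Q|\!|\!|^m \big) \le \big[ C \sqrt{m} \cdot (1 + \sqrt{\rho}) \big]^m \cdot \mathbb{E}\big( |\!|\!|Q|\!|\!|^m \big) \quad\text{when } m \ge 1. \]

The second inequality holds because moments of a Gaussian series are equivalent [LT91, Cor. 3.2].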

We have a second result that holds for other types of operator norms. We omit the proof, which, 
by now, should be obvious. 

Theorem 11 (Partial Converse II). Assume that $k \le n$. Let $Q$ be uniformly distributed on the
Stiefel manifold $\mathbb{V}_k^n$, and let $G \in \mathbb{M}^{n \times k}$ be a normalized Gaussian matrix. Suppose that $\|\cdot\|_Y$ is a
norm on $\mathbb{R}^k$ and $\|\cdot\|_Z$ is a norm on $\mathbb{R}^n$. Then

\[ \mathbb{E}\,\|G\|_{Y \to Z} \le \big( n^{-1/2}\, \mathbb{E}\,\|T\|_{Y \to Y} \big) \cdot \big( \mathbb{E}\,\|Q\|_{Y \to Z} \big) \]

where $T$ is either the upper-triangular matrix $R$ defined in Proposition 4 or the Wishart square
root $W$ defined in Proposition 9.